Abstract. In this short technical report, we define on the sample space 
a distance between data points which depends on their correlation. 
We also derive an expression for the center of mass of a set of points with 
respect to this distance. 



1 Preliminaries 

For a sample point $x = (x_1, \ldots, x_D) \in \mathbb{R}^D$, we define the average

$$\bar{x} = \frac{1}{D} \sum_{i=1}^{D} x_i$$

and the standard deviation

$$\sigma_x = \sqrt{\frac{1}{D} \sum_{i=1}^{D} (x_i - \bar{x})^2}$$

of its components, and we set $\bar{\mathbf{x}} = (\bar{x}, \ldots, \bar{x}) \in \mathbb{R}^D$.

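We now restrict our attention to $\mathbb{R}^D \setminus \mathrm{Diag}$, where

$$\mathrm{Diag} = \{(x_1, \ldots, x_D) \in \mathbb{R}^D \mid x_1 = \cdots = x_D\}$$

is the set of points with $\sigma_x = 0$.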

To $x \in \mathbb{R}^D \setminus \mathrm{Diag}$, we associate the centered and reduced variable

$$x^* = \frac{x - \bar{\mathbf{x}}}{\sigma_x} = \sqrt{D}\, \frac{x - \bar{\mathbf{x}}}{\|x - \bar{\mathbf{x}}\|}.$$

Consequently, $\overline{x^*} = 0$ and $\sigma_{x^*} = 1$, and we have

$$\|x^*\|^2 = \sum_{i=1}^{D} (x_i^*)^2 = D.$$

The geometric interpretation of this transform is that $x^*$ lies on the hypersphere $S^{D-1}(\sqrt{D}) \subset \mathbb{R}^D$ of radius $\sqrt{D}$ centered at the origin.
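The following sketch (Python, standard library only) computes the centered and reduced variable $x^*$ and checks numerically that its mean is $0$ and its squared norm is $D$. The function name `standardize` is our own; it assumes that $\sigma_x$ is the population standard deviation (division by $D$), which is what makes the radius exactly $\sqrt{D}$.

```python
import math
from statistics import fmean, pstdev


def standardize(x):
    """Return x* = (x - mean) / sigma for a point x in R^D.

    Hypothetical helper: sigma is the population standard deviation
    (division by D), matching the definition of sigma_x above.
    """
    mean = fmean(x)
    sigma = pstdev(x)
    if sigma == 0:
        # All components are equal: x lies on Diag, where x* is undefined.
        raise ValueError("x lies on Diag")
    return [(xi - mean) / sigma for xi in x]


x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
x_star = standardize(x)
D = len(x)

# x* has zero mean and squared norm D, i.e. it lies on the sphere of radius sqrt(D).
print(math.isclose(fmean(x_star), 0.0, abs_tol=1e-12))
print(math.isclose(sum(v * v for v in x_star), D))
```

Both lines print `True`. Replacing `pstdev` by `statistics.stdev`, which divides by $D - 1$, would put $x^*$ on a sphere of radius $\sqrt{D - 1}$ instead.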

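The correlation between two sample points, $x = (x_1, \ldots, x_D)$ and $y = (y_1, \ldots, y_D)$, in $\mathbb{R}^D \setminus \mathrm{Diag}$ is given by

$$\mathrm{corr}(x, y) = \frac{1}{D} \sum_{i=1}^{D} \frac{(x_i - \bar{x})(y_i - \bar{y})}{\sigma_x\, \sigma_y},$$

which can also be expressed as

$$\mathrm{corr}(x, y) = \frac{(x - \bar{\mathbf{x}}) \cdot (y - \bar{\mathbf{y}})}{\|x - \bar{\mathbf{x}}\|\, \|y - \bar{\mathbf{y}}\|} = \frac{1}{D}\, (x^* \cdot y^*),$$

where $x \cdot y$ stands for the scalar product of $x$ and $y$.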
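As a numerical check that the two expressions for the correlation agree, the sketch below computes $\mathrm{corr}(x, y)$ once from the centered vectors and once as $\frac{1}{D}(x^* \cdot y^*)$. The helper names are hypothetical, and $\sigma$ is again assumed to be the population standard deviation.

```python
import math
from statistics import fmean, pstdev


def standardize(x):
    # Centered and reduced variable x* (population standard deviation, 1/D).
    m, s = fmean(x), pstdev(x)
    return [(xi - m) / s for xi in x]


def corr_centered(x, y):
    # corr(x, y) = (x - xbar) . (y - ybar) / (||x - xbar|| * ||y - ybar||)
    mx, my = fmean(x), fmean(y)
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / (math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy)))


def corr_reduced(x, y):
    # corr(x, y) = (x* . y*) / D
    xs, ys = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(xs, ys)) / len(x)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
print(corr_centered(x, y), corr_reduced(x, y))
```

For these two points both functions return $0.6$ up to rounding: the centered vectors are $(-1.5, -0.5, 0.5, 1.5)$ and $(-0.5, -1.5, 1.5, 0.5)$, whose scalar product is $3$ and whose squared norms are both $5$.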

2 A distance based on correlation 

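We propose the following correlation-based distance

$$d(x, y) = \sqrt{1 - \mathrm{corr}(x, y)^2} = \sqrt{1 - \left(\frac{x^* \cdot y^*}{D}\right)^2} \qquad (1)$$

for $x, y \in \mathbb{R}^D \setminus \mathrm{Diag}$. Note that $0 \le d(x, y) \le 1$, since $|\mathrm{corr}(x, y)| \le 1$ by the Cauchy-Schwarz inequality.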
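A direct transcription of equation (1), again a sketch in standard-library Python with hypothetical helper names. The clamp inside `corr_distance` is our addition: it keeps the square root defined when rounding pushes $|\mathrm{corr}|$ slightly above 1.

```python
import math
from statistics import fmean, pstdev


def corr(x, y):
    """Pearson correlation, computed as (x* . y*) / D."""
    D = len(x)
    mx, sx = fmean(x), pstdev(x)
    my, sy = fmean(y), pstdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / D


def corr_distance(x, y):
    """Correlation-based distance d(x, y) = sqrt(1 - corr(x, y)^2)."""
    c = corr(x, y)
    # Rounding can make c * c slightly larger than 1; clamp before the sqrt.
    return math.sqrt(max(0.0, 1.0 - c * c))


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
print(corr(x, y), corr_distance(x, y))
```

Here $\mathrm{corr}(x, y) = 0.6$, so $d(x, y) = \sqrt{1 - 0.36} = 0.8$ up to rounding, which lies in $[0, 1]$ as expected.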
The following properties of a metric distance must be verified:

$$d(x, x) = 0, \qquad d(x, y) = d(y, x), \qquad d(x, z) \le d(x, y) + d(y, z).$$
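We have

$$d(x, x) = \sqrt{1 - \left(\frac{x^* \cdot x^*}{D}\right)^2} = \sqrt{1 - 1} = 0,$$

since $x^* \cdot x^* = \|x^*\|^2 = D$, and, obviously, $d(x, y) = d(y, x)$.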



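The main feature of this distance is that strong correlation corresponds to small distance. Indeed,

$$
\begin{aligned}
[\mathrm{corr}(x, y)]^2 = 1
&\iff \exists\, \mu \neq 0,\ \delta \in \mathbb{R} \ \text{s.t.}\ x_i = \mu y_i + \delta \quad \forall i \\
&\iff x^* = \pm y^* \\
&\iff d(x, y) = 0,
\end{aligned}
$$

which also means that the distance $d$ is degenerate, since $d(x, y) = 0$ does not imply $x = y$.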
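The degeneracy can be observed numerically: any affine image $y_i = \mu x_i + \delta$ with $\mu \neq 0$ is at distance zero from $x$, whatever the sign of $\mu$. The sketch below uses a transcription of equation (1); the values of $\mu$ and $\delta$ are arbitrary choices for illustration, and the helper names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def standardize(x):
    # Centered and reduced variable x* (population standard deviation).
    m, s = fmean(x), pstdev(x)
    return [(xi - m) / s for xi in x]


def corr_distance(x, y):
    # d(x, y) = sqrt(1 - ((x* . y*) / D)^2), clamped against rounding.
    xs, ys = standardize(x), standardize(y)
    c = sum(a * b for a, b in zip(xs, ys)) / len(x)
    return math.sqrt(max(0.0, 1.0 - c * c))


x = [0.3, -1.2, 2.5, 0.7, 1.1]
y_pos = [2.0 * v + 5.0 for v in x]    # mu = 2,    delta = 5  ->  y* = x*
y_neg = [-0.5 * v + 1.0 for v in x]   # mu = -0.5, delta = 1  ->  y* = -x*

print(corr_distance(x, y_pos), corr_distance(x, y_neg))

# Largest componentwise deviation from y* = x* and from y* = -x*.
xs = standardize(x)
print(max(abs(a - b) for a, b in zip(standardize(y_pos), xs)))
print(max(abs(a + b) for a, b in zip(standardize(y_neg), xs)))
```

Both distances come out as zero or as a tiny positive number, typically of order $10^{-8}$ or smaller: when the correlation is within a few units of machine precision of $\pm 1$, the square root in $\sqrt{1 - \mathrm{corr}^2}$ amplifies the rounding error. In floating point, the test $d(x, y) = 0$ should therefore be made with a tolerance.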

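The triangle inequality $d(x, z) \le d(x, y) + d(y, z)$ requires some explanation. A preliminary remark is that

$$d(x, y) = \sqrt{1 - \cos^2(\alpha)} = \sin(\alpha),$$

where $0 \le \alpha \le \pi$ is the angle between $x^*$ and $y^*$: since $\|x^*\| = \|y^*\| = \sqrt{D}$, we have $\cos(\alpha) = \frac{x^* \cdot y^*}{D} = \mathrm{corr}(x, y)$, and $\sin(\alpha) \ge 0$ on $[0, \pi]$.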
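The identity $d(x, y) = \sin(\alpha)$ can be checked by computing the angle $\alpha$ between $x^*$ and $y^*$ explicitly with `math.acos`. This is a numerical illustration under the definitions above, not part of the argument; `angle_between` is a hypothetical helper.

```python
import math
from statistics import fmean, pstdev


def standardize(x):
    # Centered and reduced variable x*, which has norm sqrt(D).
    m, s = fmean(x), pstdev(x)
    return [(xi - m) / s for xi in x]


def angle_between(x, y):
    """Angle alpha in [0, pi] between x* and y*."""
    xs, ys = standardize(x), standardize(y)
    # ||x*|| = ||y*|| = sqrt(D), so cos(alpha) = (x* . y*) / D = corr(x, y).
    cos_alpha = sum(a * b for a, b in zip(xs, ys)) / len(x)
    # Clamp to [-1, 1] so that rounding cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos_alpha)))


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
alpha = angle_between(x, y)
print(alpha, math.sin(alpha))
```

Here $\cos(\alpha) = \mathrm{corr}(x, y) = 0.6$, so $\alpha \approx 0.927$ rad and $\sin(\alpha) = 0.8$, which is the value of $d(x, y)$ given by equation (1).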

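Replacing $y^*$ by $-y^*$ and $z^*$ by $-z^*$ if necessary (which leaves $d$ unchanged, since $d$ depends only on $\mathrm{corr}^2$), we can assume that the angles $\alpha$ between $x^*$ and $y^*$ and $\beta$ between $y^*$ and $z^*$ belong to $[0, \pi/2]$. Consider the point $\tilde{z}$ obtained by rotating $z^*$ around the axis defined by $y^*$, into the plane determined by $x^*$ and $y^*$, but on the opposite side of $y^*$ from $x^*$. The angle between $y^*$ and $\tilde{z}$ is still $\beta$. However, the angle between $x^*$ and $\tilde{z}$, which equals $\alpha + \beta$, is greater than or equal to the angle $\gamma$ between $x^*$ and $z^*$.

If $\alpha + \beta \le \pi/2$, then, since $\sin$ is increasing on $[0, \pi/2]$,

$$d(x, z) = \sin(\gamma) \le \sin(\alpha + \beta) = \sin(\alpha) \underbrace{\cos(\beta)}_{\in [0, 1]} + \sin(\beta) \underbrace{\cos(\alpha)}_{\in [0, 1]} \le \sin(\alpha) + \sin(\beta) = d(x, y) + d(y, z).$$

If instead $\alpha + \beta > \pi/2$, then $\beta > \pi/2 - \alpha \ge 0$, so $\sin(\beta) > \sin(\pi/2 - \alpha) = \cos(\alpha)$ and

$$d(x, y) + d(y, z) = \sin(\alpha) + \sin(\beta) > \sin(\alpha) + \cos(\alpha) \ge 1 \ge d(x, z),$$

where $\sin(\alpha) + \cos(\alpha) \ge 1$ because $(\sin(\alpha) + \cos(\alpha))^2 = 1 + \sin(2\alpha) \ge 1$ on $[0, \pi/2]$.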
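The triangle inequality can also be probed empirically. The sketch below draws random triples of points and reports the smallest value of $d(x, y) + d(y, z) - d(x, z)$, which should be nonnegative up to rounding error. This is a sanity check on random data under our own choice of distribution (standard normal components), not a substitute for the proof; the function names are hypothetical.

```python
import math
import random
from statistics import fmean, pstdev


def corr_distance(x, y):
    # d(x, y) = sqrt(1 - corr(x, y)^2), with corr computed as (x* . y*) / D.
    mx, sx = fmean(x), pstdev(x)
    my, sy = fmean(y), pstdev(y)
    c = sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / len(x)
    return math.sqrt(max(0.0, 1.0 - c * c))


def min_triangle_slack(trials=2000, D=5, seed=0):
    """Smallest value of d(x, y) + d(y, z) - d(x, z) over random triples.

    Points are drawn with independent standard normal components, so with
    probability one none of them lies on Diag.
    """
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        x, y, z = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(3)]
        slack = corr_distance(x, y) + corr_distance(y, z) - corr_distance(x, z)
        worst = min(worst, slack)
    return worst


print(min_triangle_slack())
```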

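As previously mentioned, the distance $d$ is degenerate on $\mathbb{R}^D \setminus \mathrm{Diag}$, as well as on $S^{D-1}(\sqrt{D})$, where $d(x^*, -x^*) = 0$. However, since $d(x, y) = 0$ only when $x^* = \pm y^*$, it induces a non-degenerate distance on the projective space (i.e. the space of lines through the origin in $\mathbb{R}^D$).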

3 The center of mass 

From now on, we assume that all variables are centered and reduced. Hence, we restrict the sample space to the hypersphere $S^{D-1}(\sqrt{D}) \subset \mathbb{R}^D$ of radius $\sqrt{D}$ centered at the origin, and we omit the $*$ notation.



We compute the center of mass $g \in S^{D-1}(\sqrt{D})$ of a set of points $\{x_j\}_{j=1}^{n}$ on $S^{D-1}(\sqrt{D})$. By definition, the center of mass minimizes the average squared distance to the set of points. We therefore want to minimize the expression

$$F(g) = \frac{1}{n} \sum_{j=1}^{n} d(g, x_j)^2 = 1 - \frac{1}{n} \sum_{j=1}^{n} \left(\frac{g \cdot x_j}{D}\right)^2 \qquad (2)$$

under the constraint

$$H(g) = 1 - \frac{1}{D}\, g \cdot g = 0 \qquad (3)$$

that $g$ lies on $S^{D-1}(\sqrt{D})$.
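The objective $F$ and the constraint $H$ translate directly into code. In this sketch, points are assumed to be already centered and reduced (they lie on the sphere of radius $\sqrt{D}$); the function names `objective_F` and `constraint_H` are hypothetical.

```python
from statistics import fmean, pstdev


def standardize(x):
    # Map a raw point to x* on the sphere of radius sqrt(D).
    m, s = fmean(x), pstdev(x)
    return [(xi - m) / s for xi in x]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def objective_F(g, points):
    """F(g) = 1 - (1/n) * sum_j (g . x_j / D)^2, the average of d(g, x_j)^2."""
    D = len(g)
    return 1.0 - sum((dot(g, x) / D) ** 2 for x in points) / len(points)


def constraint_H(g):
    """H(g) = 1 - (g . g) / D, which vanishes exactly on the sphere."""
    return 1.0 - dot(g, g) / len(g)


points = [standardize(p) for p in ([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0])]
g = points[0]
print(constraint_H(g), objective_F(g, points))
```

With $g = x_1$, the constraint holds up to rounding, and $F(g) = 0.32$ is the average of $d(x_1, x_1)^2 = 0$ and $d(x_1, x_2)^2 = 1 - 0.6^2 = 0.64$.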

We solve this problem using the method of Lagrange multipliers. 
The gradients of F and H must satisfy 

$$\nabla F(g) = \lambda \nabla H(g),$$

or equivalently

$$-\frac{2}{n D^2} \sum_{j=1}^{n} (g \cdot x_j)\, x_{jk} = -\frac{2 \lambda}{D}\, g_k \qquad (k = 1, \ldots, D), \qquad (4)$$

where $x_{jk}$ denotes the $k$-th component of $x_j$. Equation (4) can be rewritten as

$$\frac{1}{n} \sum_{j=1}^{n} \frac{1}{D} \left( \sum_{i=1}^{D} x_{ji}\, g_i \right) x_{jk} = \lambda g_k \qquad (k = 1, \ldots, D). \qquad (5)$$

If we define the $D \times D$ matrix $M = (m_{ik})$ by

$$m_{ik} = \frac{1}{n D} \sum_{j=1}^{n} x_{jk}\, x_{ji} \qquad (i, k = 1, \ldots, D),$$

then equation (5) becomes

$$\sum_{i=1}^{D} m_{ik}\, g_i = \lambda g_k \qquad (k = 1, \ldots, D),$$

or equivalently

$$M g = \lambda g.$$

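Thus, minimizing $F$ (eq. (2)) under the constraint $H$ (eq. (3)) reduces to finding the eigenvectors of $M$. At an eigenvector $g$ normalized so that $g \cdot g = D$, we have $F(g) = 1 - \frac{1}{D}\, g^{\top} M g = 1 - \lambda$, so $F$ is minimal for the eigenvector associated with the largest eigenvalue of $M$. This eigenvector, correctly normalized in order to satisfy $H$, yields the center of mass of the set of points $\{x_j\}_{j=1}^{n}$ on $S^{D-1}(\sqrt{D})$, up to sign ($g$ and $-g$ give the same value of $F$). Since $M$ is symmetric, all its eigenvalues are real.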
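The sketch below computes the center of mass by building $M$, extracting its top eigenvector with power iteration, and rescaling it so that $g \cdot g = D$. Power iteration is our choice of eigen-solver, not something prescribed here: it relies on $M$ being positive semidefinite (so that the eigenvalue of largest modulus is the largest eigenvalue) and assumes a gap between the two largest eigenvalues, converging slowly otherwise. A library routine for symmetric eigen-decomposition could be used instead. All function names and the sample data are hypothetical.

```python
import math
import random
from statistics import fmean, pstdev


def standardize(x):
    # Map a raw point to x* on the sphere of radius sqrt(D).
    m, s = fmean(x), pstdev(x)
    return [(xi - m) / s for xi in x]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build_M(points):
    """M = (m_ik) with m_ik = (1 / (n D)) * sum_j x_jk * x_ji."""
    n, D = len(points), len(points[0])
    return [[sum(p[i] * p[k] for p in points) / (n * D) for k in range(D)]
            for i in range(D)]


def mat_vec(M, v):
    return [dot(row, v) for row in M]


def objective_F(g, points):
    # F(g) = 1 - (1/n) * sum_j (g . x_j / D)^2
    D = len(g)
    return 1.0 - sum((dot(g, x) / D) ** 2 for x in points) / len(points)


def center_of_mass(points, iters=500, seed=0):
    """Return (g, lam): top eigenvector of M scaled to g . g = D, and its eigenvalue.

    Power iteration from a random start; assumes the start is not orthogonal
    to the top eigenvector and that the top eigenvalue is well separated.
    """
    D = len(points[0])
    M = build_M(points)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(D)]
    for _ in range(iters):
        w = mat_vec(M, v)
        norm = math.sqrt(dot(w, w))
        v = [a / norm for a in w]
    lam = dot(v, mat_vec(M, v))          # Rayleigh quotient (v has unit norm)
    g = [a * math.sqrt(D) for a in v]    # rescale so that H(g) = 0
    return g, lam


raw = [[1.0, 2.0, 3.0, 4.0, 5.0],
       [1.2, 1.9, 3.1, 3.8, 5.3],
       [0.9, 2.2, 2.7, 4.1, 4.9],
       [5.0, 4.1, 2.9, 2.0, 1.1]]
points = [standardize(p) for p in raw]
g, lam = center_of_mass(points)
print(lam, objective_F(g, points))
```

The fourth point is strongly anti-correlated with the first three, yet it pulls the center of mass along the same axis, because $d$ depends only on $\mathrm{corr}^2$. The printed value of $F(g)$ equals $1 - \lambda$ up to rounding, since $g^{\top} M g = \lambda\, g \cdot g = \lambda D$; the sign of $g$ is arbitrary.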



